Proposal on Clarification and Consolidation of the Function of ZERO WIDTH JOINER in Indic Scripts

Author

  • Peter Constable
Abstract

One problem with this arrangement, in which the Unicode Standard presents the encoding model for Indic scripts in its section on Devanagari (Section 9.1), is that the Indic scripts are not all the same; in fact, some of the differences between scripts are very significant. Two particular problems result from these differences. First, it is not clear how certain encoding formalisms specified in Section 9.1 are to be applied to the other Indic scripts. Second, common problems found in other scripts are not addressed in the section on Devanagari.
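As a hedged illustration of the kind of formalism at issue, the sketch below builds the Devanagari sequences involved. It is not taken from this paper. In the Devanagari section of the Unicode Standard, a dead consonant (consonant + VIRAMA) normally joins the following consonant into a conjunct. Inserting ZERO WIDTH JOINER after the virama requests the half-form of the first consonant, and inserting ZERO WIDTH NON-JOINER requests a visible virama. The helper names (`default_conjunct`, `explicit_half_form`, `explicit_virama`, `describe`) are hypothetical, and the code only constructs and names the coded sequences; actual rendering depends on the font and shaping engine.

```python
import unicodedata

# Devanagari code points used in this illustration.
KA = "\u0915"      # DEVANAGARI LETTER KA
SSA = "\u0937"     # DEVANAGARI LETTER SSA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER


def dead_consonant(c: str) -> str:
    """Consonant + virama: the 'dead' form that normally joins the next consonant."""
    return c + VIRAMA


# Hypothetical helpers: each returns a coded sequence whose intended display
# follows the Devanagari conventions; none of them performs any rendering.
def default_conjunct(c1: str, c2: str) -> str:
    """C1 + VIRAMA + C2: displayed as a conjunct if the font provides one."""
    return dead_consonant(c1) + c2


def explicit_half_form(c1: str, c2: str) -> str:
    """C1 + VIRAMA + ZWJ + C2: requests the half-form of C1."""
    return dead_consonant(c1) + ZWJ + c2


def explicit_virama(c1: str, c2: str) -> str:
    """C1 + VIRAMA + ZWNJ + C2: requests C1 with a visible virama."""
    return dead_consonant(c1) + ZWNJ + c2


def describe(s: str) -> list[str]:
    """Return the Unicode character names of every code point in s."""
    return [unicodedata.name(ch) for ch in s]


if __name__ == "__main__":
    for label, seq in [
        ("conjunct", default_conjunct(KA, SSA)),
        ("half-form", explicit_half_form(KA, SSA)),
        ("visible virama", explicit_virama(KA, SSA)),
    ]:
        print(label)
        for name in describe(seq):
            print("   ", name)
```

The same three-way contrast (conjunct, half-form, visible virama) is what becomes unclear when scripts other than Devanagari lack half-forms or use different conjunct mechanisms.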

Similar resources

Proposal on Handling Reph in Gurmukhi and Telugu Scripts

Chapter 9 of the Unicode Standard [1] describes the representational model for encoding Indic scripts. Devanagari is described in Section 9.1; the principles of Indic scripts are covered in some detail in the introduction to Devanagari. The descriptions of the remaining Indic scripts were abbreviated, highlighting any differences from Devanagari where appropriate. Some of the problems in this des...

Comparison of Visual and Logical Character Segmentation in Tesseract OCR Language Data for Indic Writing Scripts

Language data for the Tesseract OCR system currently supports recognition of a number of languages written in Indic writing scripts. An initial study is described that creates comparable data for Tesseract training and evaluation based on two approaches to character segmentation of Indic scripts: logical vs. visual. Results indicate further investigation of visual based character segmentation lang...

Cross-language Framework for Word Recognition and Spotting of Indic Scripts

Handwritten word recognition and spotting of low-resource scripts are difficult because sufficient training data is not available and collecting data for such scripts is often expensive. This paper presents a novel cross-language platform for handwritten word recognition and spotting for such low-resource scripts, where training is performed with a sufficiently large dataset of an available scr...

Indica, an Indic preprocessor for TeX. A Sinhalese TeX System

In this paper a two-fold project is described: the first part is a generalized preprocessor for Indic scripts (scripts of languages currently spoken in India except Urdu, as well as Sanskrit and Tibetan), with several kinds of input (LaTeX commands, 7-bit ASCII, CSX, ISO/IEC 10646/Unicode) and TeX output. This utility is written in standard Flex (the GNU version of Lex), and hence can be painlessly compil...

An Overview of Indic Fonts for TeX

Many scholars and students in the humanities have preferred TeX over other “word processors” or document preparation systems because of the ease with which TeX lets them typeset non-Roman scripts, the availability of TeX fonts of interest to them, and TeX's ability to produce well-structured documents. However, this is not the case amongst Indologists. The lack of Indic fonts for TeX an...

Publication date: 2004